In this lesson

Why visualize?

The Datasaurus Dozen, by Justin Matejka and George Fitzmaurice

The Datasaurus, by Alberto Cairo

Plotspiration

Annotated plots, by Cédric Scherer

Formula one racing, by Georgios Karamanis

Generative Art, “Heartbleed” series by Danielle Navarro

Animated graphics with the gganimate package

Load some packages

Read in data (in case you removed it)

## # A tibble: 6 x 10
##   Site     Date       Time  Treatment      pH Alkalinity   DOC   SO4  MeHg    Hg
##   <chr>    <date>     <chr> <ord>       <dbl>      <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Referen~ 2011-05-16 00:00 Pre-Treatm~  4.59      -21.3  4.93  2.63    NA    NA
## 2 Referen~ 2011-06-06 09:25 Pre-Treatm~  4.73      -11.4  3.72  3.24    NA    NA
## 3 Referen~ 2011-07-19 12:25 Pre-Treatm~  4.62      -14.2  3.3   3.59    NA    NA
## 4 Referen~ 2011-08-01 00:00 Pre-Treatm~  4.91       11.8  3.58  3.79    NA    NA
## 5 Referen~ 2011-10-07 00:00 Pre-Treatm~  4.68      -14.7  4.46  3.06    NA    NA
## 6 Referen~ 2011-10-31 00:00 Pre-Treatm~  4.78       -0.5  4.15  2.91    NA    NA
## Rows: 438
## Columns: 10
## $ Site       <chr> "Reference", "Reference", "Reference", "Reference", "Ref...
## $ Date       <date> 2011-05-16, 2011-06-06, 2011-07-19, 2011-08-01, 2011-10...
## $ Time       <chr> "00:00", "09:25", "12:25", "00:00", "00:00", "00:00", "0...
## $ Treatment  <ord> Pre-Treatment, Pre-Treatment, Pre-Treatment, Pre-Treatme...
## $ pH         <dbl> 4.59, 4.73, 4.62, 4.91, 4.68, 4.78, 4.51, 4.47, 4.45, 4....
## $ Alkalinity <dbl> -21.3, -11.4, -14.2, 11.8, -14.7, -0.5, -24.6, -32.0, -3...
## $ DOC        <dbl> 4.93, 3.72, 3.30, 3.58, 4.46, 4.15, 4.59, 4.51, 4.14, 3....
## $ SO4        <dbl> 2.63, 3.24, 3.59, 3.79, 3.06, 2.91, 2.88, 2.47, 2.62, 2....
## $ MeHg       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ Hg         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## Rows: 2,628
## Columns: 6
## $ Site      <chr> "Reference", "Reference", "Reference", "Reference", "Refe...
## $ Date      <date> 2011-05-16, 2011-05-16, 2011-05-16, 2011-05-16, 2011-05-...
## $ Time      <chr> "00:00", "00:00", "00:00", "00:00", "00:00", "00:00", "09...
## $ Treatment <ord> Pre-Treatment, Pre-Treatment, Pre-Treatment, Pre-Treatmen...
## $ Name      <chr> "pH", "Alkalinity", "DOC", "SO4", "MeHg", "Hg", "pH", "Al...
## $ Value     <dbl> 4.59, -21.30, 4.93, 2.63, NA, NA, 4.73, -11.40, 3.72, 3.2...

Plotting in base R

A few straightforward types of plots are built into base R. We can create scatterplots, boxplots, and histograms without much effort.
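As a minimal sketch of these three base R plots, the block below uses a small simulated data frame. The name `chem` and its values are hypothetical stand-ins for the lesson data; only the column names (Site, DOC, SO4) come from the preview above.

```r
# Hypothetical stand-in for the lesson data (values are simulated)
set.seed(42)
chem <- data.frame(
  Site = rep(c("Reference", "Treatment"), each = 25),
  DOC  = rnorm(50, mean = 4, sd = 0.5),
  SO4  = rnorm(50, mean = 3, sd = 0.4)
)

# Scatterplot: one numeric column against another
plot(chem$DOC, chem$SO4, xlab = "DOC", ylab = "SO4")

# Boxplot: a numeric column split by a grouping column (formula interface)
boxplot(SO4 ~ Site, data = chem)

# Histogram: hist() also returns the bin information invisibly
h <- hist(chem$DOC, main = "DOC", xlab = "DOC")
```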

Visualizing missingness with the visdat package

What is ggplot2?

It is based on a “grammar of graphics” that uses layers to describe the components of a figure, including the data and the “non-data” pieces.

ggplot2 primarily builds plots using data frames, and it works well with other tidyverse packages.

One major distinction is that instead of pipes (%>%), ggplot2 commands are separated by ‘+’ signs. Think of this as “adding” information to the plot. The ordering of commands is also flexible, so we will look at this example to come up with some suggestions.
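To make the "adding" idea concrete, here is a small sketch that builds a plot one layer at a time. The data frame `chem` is simulated and hypothetical; the ordering shown (data and aesthetics, then geoms, then labels, then theme) is one reasonable convention, not a requirement.

```r
library(ggplot2)

# Hypothetical stand-in for the lesson data (values are simulated)
set.seed(42)
chem <- data.frame(
  DOC = rnorm(50, mean = 4, sd = 0.5),
  SO4 = rnorm(50, mean = 3, sd = 0.4)
)

p <- ggplot(data = chem, mapping = aes(x = DOC, y = SO4)) +  # data and aesthetics
  geom_point() +                                             # how to draw the data
  labs(title = "Sulfate vs. organic carbon") +               # non-data text
  theme_bw()                                                 # non-data styling
p
```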

An example plot

Aesthetics with aes()

In ggplot2, aesthetics are the way we map columns of our data to different visual attributes of our plots.

We can use the levels, values, and relationships in our data to control visual attributes such as position (x and y), color, fill, shape, and size.

When we put things inside the aes() function, they automatically take on values from the data, rather than us manually setting them. (More on this later!)

Our first “ggplot”

Let’s make sure we’ve loaded the ggplot2 package, then make a histogram of the Organic Carbon values by mapping the DOC column to x inside the aes() function.

Oops, what happened?? We didn’t specify a geom, so it didn’t know what type of plot to make!
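A quick sketch of what is going on, using only the first six DOC values from the preview above (the data frame name `chem` is hypothetical): the call sets up the data and the x-axis, but the plot has no layers to draw yet.

```r
library(ggplot2)

# First six DOC values from the data preview; `chem` is a hypothetical name
chem <- data.frame(DOC = c(4.93, 3.72, 3.30, 3.58, 4.46, 4.15))

p <- ggplot(chem, aes(x = DOC))

length(p$layers)  # 0: no geom has been added yet
p                 # draws an empty panel with a DOC axis
```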

Geoms, with geom_*()

The geom_ functions tell ggplot2 what shape to use to represent our data (points, lines, boxplots, etc.). Each one also takes more specific arguments for styling these shapes.

geom_histogram

Histograms need a numeric variable on the x-axis.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 47 rows containing non-finite values (stat_bin).

## [1] 47

Quick aside: the default ggplot themes are a bit hard to look at, so I will be switching to theme_bw() for the rest of these plots. We will cover themes more later in the session.

We can customize some of the arguments in geom_histogram to make it easier to understand. For instance, let’s change the bin borders to white, the number of bins to 20, and add a title!

## Warning: Removed 47 rows containing non-finite values (stat_bin).
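The customizations described above might look like the sketch below. The data are simulated (with two missing values added on purpose so the warning appears), and the title text is made up; the arguments `color`, `bins`, and `labs(title = ...)` are standard ggplot2.

```r
library(ggplot2)

# Hypothetical stand-in for the DOC column, with two missing values
set.seed(42)
chem <- data.frame(DOC = c(rnorm(100, mean = 4, sd = 0.5), NA, NA))

# Count the missing values that geom_histogram() will warn about
n_missing <- sum(is.na(chem$DOC))
n_missing

p <- ggplot(chem, aes(x = DOC)) +
  geom_histogram(color = "white", bins = 20) +  # white bin borders, 20 bins
  labs(title = "Distribution of DOC") +         # add a title
  theme_bw()
p
```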

geom_point and geom_line

These are very useful for charts that have data across multiple axes - such as scatterplots and time series.

First we can make a scatterplot of the DOC values vs the Sulfate values.

## Warning: Removed 48 rows containing missing values (geom_point).

What if we want to change the color of all the points to blue? Is it this simple?

## Warning: Removed 48 rows containing missing values (geom_point).

Nope! For settings that are fixed, we need to move them outside of the aes function.

## Warning: Removed 48 rows containing missing values (geom_point).
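The two versions side by side, as a sketch on simulated data (`chem` is a hypothetical stand-in): inside aes(), "blue" is treated as a data value; outside aes(), it is a fixed setting.

```r
library(ggplot2)

# Hypothetical stand-in for the lesson data (values are simulated)
set.seed(42)
chem <- data.frame(
  DOC = rnorm(50, mean = 4, sd = 0.5),
  SO4 = rnorm(50, mean = 3, sd = 0.4)
)

# Inside aes(): "blue" becomes a one-level variable, so ggplot2 adds a legend
# and colors the points from its default palette instead
p_mapped <- ggplot(chem, aes(x = DOC, y = SO4)) +
  geom_point(aes(color = "blue"))

# Outside aes(): every point is drawn in blue
p_fixed <- ggplot(chem, aes(x = DOC, y = SO4)) +
  geom_point(color = "blue")

p_mapped
p_fixed
```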

Now let’s plot the Alkalinity values over time.

## Warning: Removed 5 rows containing missing values (geom_point).

What is this strange pattern going on? Let’s check the treatment periods.

## Warning: Removed 5 rows containing missing values (geom_point).

Interesting. What about the reference vs treatment locations?

## Warning: Removed 5 rows containing missing values (geom_point).

Let’s try switching this to a line chart instead.

What if we want to use both?

## Warning: Removed 5 rows containing missing values (geom_point).
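Combining the two geoms is just a matter of adding both layers. This sketch uses a tiny made-up time series (dates and Alkalinity values are hypothetical) with color mapped to Site.

```r
library(ggplot2)

# Hypothetical stand-in: six monthly samples at each of two sites
chem <- data.frame(
  Site = rep(c("Reference", "Treatment"), each = 6),
  Date = rep(seq(as.Date("2011-05-01"), by = "month", length.out = 6), times = 2),
  Alkalinity = c(-21, -11, -14, 12, -15, -1, -20, -5, 10, 40, 55, 60)
)

p <- ggplot(chem, aes(x = Date, y = Alkalinity, color = Site)) +
  geom_line() +   # connect the samples within each site
  geom_point() +  # mark each individual sample
  theme_bw()
p
```

Because the aesthetics are set in ggplot(), both layers inherit them, so the lines and points share the same colors.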

geom_bar

Bar charts can take an x input for vertical bars or a y input to make horizontal bars.

## [1] 438
## # A tibble: 6 x 2
##   Name           n
##   <chr>      <int>
## 1 Alkalinity   438
## 2 DOC          438
## 3 Hg           438
## 4 MeHg         438
## 5 pH           438
## 6 SO4          438

## [1] FALSE  TRUE FALSE  TRUE  TRUE
## # A tibble: 6 x 2
##   Name           n
##   <chr>      <int>
## 1 Alkalinity   433
## 2 DOC          391
## 3 Hg            41
## 4 MeHg          36
## 5 pH           434
## 6 SO4          435
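As a sketch of the x versus y choice, the block below counts non-missing values per analyte in a small long-format data frame. The name `chem_long` and its values are hypothetical; the Name and Value columns mirror the long data shown earlier.

```r
library(ggplot2)

# Hypothetical long-format stand-in with some missing values
chem_long <- data.frame(
  Name  = rep(c("pH", "DOC", "Hg"), each = 4),
  Value = c(4.6, 4.7, 4.6, 4.9,
            4.9, 3.7, NA, 3.6,
            NA, NA, NA, 0.8)
)

# Keep only rows with an observed value, then count rows per analyte
observed <- chem_long[!is.na(chem_long$Value), ]
table(observed$Name)

# x input: vertical bars
p_vertical <- ggplot(observed, aes(x = Name)) + geom_bar()

# y input: horizontal bars
p_horizontal <- ggplot(observed, aes(y = Name)) + geom_bar()

p_vertical
p_horizontal
```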

geom_density

This is a quick way to show a 1-dimensional density plot (which is like a continuous version of a histogram). Here the x-axis represents our Organic Carbon values, and the y-axis is their relative density.

## Warning: Removed 47 rows containing non-finite values (stat_density).

What if we want to split it up by treatment period?

## Warning: Removed 47 rows containing non-finite values (stat_density).

It is easier to see overlapping densities if we change their transparency, using the alpha argument.

## Warning: Removed 47 rows containing non-finite values (stat_density).
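A sketch of the split-and-fade version on simulated data. Here `chem` and the "Post-Treatment" level are assumptions; only "Pre-Treatment" appears in the preview above.

```r
library(ggplot2)

# Hypothetical stand-in: DOC values for two treatment periods
set.seed(42)
chem <- data.frame(
  Treatment = rep(c("Pre-Treatment", "Post-Treatment"), each = 50),
  DOC = c(rnorm(50, mean = 4, sd = 0.5), rnorm(50, mean = 5, sd = 0.7))
)

p <- ggplot(chem, aes(x = DOC, fill = Treatment)) +  # one density per period
  geom_density(alpha = 0.5) +                        # 50% transparent fill
  theme_bw()
p
```

Note that fill goes inside aes() because it depends on the data, while alpha is a fixed setting and stays outside.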

Note: There is also a geom_density_2d for bivariate densities and contour-type plots.

geom_boxplot

We can start with an overall boxplot of the Sulfate values. Let’s put the values on the y-axis.

## Warning: Removed 3 rows containing non-finite values (stat_boxplot).

This is not super interesting on its own. We can change the border of the boxes with the color argument and the inside color of the boxes with the fill argument.

## Warning: Removed 3 rows containing non-finite values (stat_boxplot).

What if we break it up by site and add a fill color?

## Warning: Removed 3 rows containing non-finite values (stat_boxplot).
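Both boxplot variants, sketched on simulated data (`chem`, its values, and the specific color names are hypothetical choices):

```r
library(ggplot2)

# Hypothetical stand-in: sulfate values at two sites
set.seed(42)
chem <- data.frame(
  Site = rep(c("Reference", "Treatment"), each = 30),
  SO4  = c(rnorm(30, mean = 3, sd = 0.4), rnorm(30, mean = 2.2, sd = 0.5))
)

# One overall box, with fixed border (color) and interior (fill) colors
p_overall <- ggplot(chem, aes(y = SO4)) +
  geom_boxplot(color = "darkblue", fill = "lightblue") +
  theme_bw()

# One box per site, with the fill color mapped to Site
p_by_site <- ggplot(chem, aes(x = Site, y = SO4, fill = Site)) +
  geom_boxplot() +
  theme_bw()

p_overall
p_by_site
```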

Naming plots and adding on to them

A common practice that saves lines of code, and saves us from re-running them, is to store our ggplots as objects. This is as simple as using the assignment operator (<-) and giving the plot a variable name.

Note: if we assign a plot like this, it will not show up unless we call that variable name!

Weird, but true: We can continue to add to our ggplots using the ‘+’ sign after we give them a variable name.

This is a helpful practice if you have a basic plot you want to customize in different ways, or for saving plots to files (later).
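A short sketch of this workflow; the object names (`base_plot`, `titled_plot`) and the simulated data are hypothetical.

```r
library(ggplot2)

# Hypothetical stand-in for the lesson data (values are simulated)
set.seed(42)
chem <- data.frame(
  DOC = rnorm(50, mean = 4, sd = 0.5),
  SO4 = rnorm(50, mean = 3, sd = 0.4)
)

# Assigning the plot stores it; nothing is drawn yet
base_plot <- ggplot(chem, aes(x = DOC, y = SO4)) +
  geom_point()

# Calling the variable name draws it
base_plot

# Keep adding to the stored plot with '+'
titled_plot <- base_plot +
  labs(title = "Sulfate vs. organic carbon") +
  theme_bw()
titled_plot
```

The original `base_plot` is unchanged, so it can be reused as the starting point for other variations.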

Labels and Legends

Often our column name isn’t the clearest way to describe our axes or legends.

We can set more helpful labels for any of our aesthetics with the labs() function. These can cover x and y axes, color, fill, shape, and even titles and subtitles.
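For instance, a sketch with labs() on simulated data. The label wording is made up, and units are left out because the source does not state them.

```r
library(ggplot2)

# Hypothetical stand-in for the lesson data (values are simulated)
set.seed(42)
chem <- data.frame(
  Site = rep(c("Reference", "Treatment"), each = 25),
  DOC  = rnorm(50, mean = 4, sd = 0.5),
  SO4  = rnorm(50, mean = 3, sd = 0.4)
)

p <- ggplot(chem, aes(x = DOC, y = SO4, color = Site)) +
  geom_point() +
  labs(
    x = "Organic carbon (DOC)",   # x-axis label
    y = "Sulfate (SO4)",          # y-axis label
    color = "Sampling site",      # legend title for the color aesthetic
    title = "Sulfate vs. organic carbon",
    subtitle = "Simulated example data"
  )
p
```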

Facets

Facets are a way to partition your data and create multiple plots at once, using a grouping variable. For example, we may have multiple analytes and want to plot each one separately.

## Warning: Removed 858 rows containing missing values (geom_point).

Often, we have variables with different value scales in our facets. When this is the case, we may want to specify that the axes can vary in each plot using the scales argument (for example, scales = "free_y").

We can also change the arrangement of the facets using the ncol and nrow arguments.

## Warning: Removed 858 rows containing non-finite values (stat_density).

## Warning: Removed 858 rows containing missing values (geom_point).
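A sketch of faceting with free y-axes and a chosen layout, assuming facet_wrap() is the faceting function in use. The pH and Alkalinity values are the first six rows of the preview above; the data frame name `chem_long` and the evenly spaced dates are hypothetical.

```r
library(ggplot2)

# pH and Alkalinity from the first six preview rows; dates are simplified
dates <- seq(as.Date("2011-05-01"), by = "month", length.out = 6)
chem_long <- data.frame(
  Date  = rep(dates, times = 2),
  Name  = rep(c("pH", "Alkalinity"), each = 6),
  Value = c(4.59, 4.73, 4.62, 4.91, 4.68, 4.78,
            -21.3, -11.4, -14.2, 11.8, -14.7, -0.5)
)

p <- ggplot(chem_long, aes(x = Date, y = Value)) +
  geom_point() +
  facet_wrap(~ Name, scales = "free_y", ncol = 1) +  # one panel per analyte
  theme_bw()
p
```

With scales = "free_y", each panel gets its own y-axis range, so the narrow pH range is not flattened by the much wider Alkalinity range.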

Themes

Themes control all the components of a ggplot graphic that are not directly tied to the data. We can think of this as the “metadata” for a plot - background color, gridlines, font, text sizes, etc.

Here are some themes built in to ggplot2:

There are many other themes available in other R packages. Check out the ggthemes package for some more variety and professional graphs in the style of FiveThirtyEight, the Wall Street Journal, and the Economist!

Adjusting an Existing Theme

We can also add a theme() line to any ggplot and change the underlying metadata manually. For example, we can use this line of code to center our title. (PS, I have to google this every single time!)

## Warning: Removed 47 rows containing non-finite values (stat_density).

Note: when we add custom theme elements, it is important that those lines go after any built-in theme calls or they will get overwritten.
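A sketch of centering a title, with the theme() call placed after the built-in theme so it is not overwritten. The data and title are made up.

```r
library(ggplot2)

# Hypothetical stand-in for the DOC column (values are simulated)
set.seed(42)
chem <- data.frame(DOC = rnorm(100, mean = 4, sd = 0.5))

p <- ggplot(chem, aes(x = DOC)) +
  geom_density() +
  labs(title = "Density of DOC") +
  theme_bw() +                                     # built-in theme first
  theme(plot.title = element_text(hjust = 0.5))    # then the custom tweak
p
```

Swapping the last two lines would let theme_bw() reset the title back to its default left alignment.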

Scales

Scales control the range of values for a particular aesthetic. For example, a scale can adjust the x axis limits or apply a log-transformation. We can also use scales to set color palettes, sizes of points, etc.

There are many families of scale functions. Here, the * symbol stands for an aesthetic such as x, y, color, fill, shape, or size.

General

Transformations with the trans argument

  • scale_x_continuous(trans = "log")
  • scale_x_continuous(trans = "sqrt")

## Warning: Removed 3 rows containing non-finite values (stat_bin).
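A sketch of a log-transformed x-axis on simulated, right-skewed data (`chem` and its values are hypothetical). As far as I know, newer ggplot2 releases (3.5.0 onward) prefer the argument name `transform`, while `trans` still works; scale_x_log10() is a base-10 shortcut.

```r
library(ggplot2)

# Hypothetical stand-in: right-skewed (log-normal) sulfate values
set.seed(42)
chem <- data.frame(SO4 = rlnorm(200, meanlog = 1, sdlog = 0.4))

# Histogram on the original scale
p_raw <- ggplot(chem, aes(x = SO4)) +
  geom_histogram(bins = 20, color = "white") +
  theme_bw()

# Same histogram with a natural-log x-axis; the bins are computed
# on the transformed values
p_log <- p_raw + scale_x_continuous(trans = "log")

p_raw
p_log
```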

This log-transformed data looks fairly normal! What if we wanted to add a normal distribution to it? The ggpubr package has a number of functions that make publication-quality features.

## Warning: Removed 3 rows containing non-finite values (stat_bin).
## Warning: Removed 3 rows containing non-finite values
## (stat_overlay_normal_density).

This suggests that our original values are approximately log-normally distributed.

See this link for a complete list of the possible ways to use scale functions!

Coordinates

What if we wanted a vertical bar chart instead of a horizontal one?

What if we want our scatterplot axes to be clipped to some specific limits with no extra space?

## Warning: Removed 48 rows containing missing values (geom_point).

## Warning: Removed 48 rows containing missing values (geom_point).
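One way to handle both questions is with coordinate functions, sketched below on simulated data. The choice of coord_flip() and coord_cartesian() here is an assumption about the approach, and the limits are arbitrary.

```r
library(ggplot2)

# Hypothetical stand-in for the lesson data (values are simulated)
set.seed(42)
chem <- data.frame(
  Site = rep(c("Reference", "Treatment"), times = c(20, 30)),
  DOC  = rnorm(50, mean = 4, sd = 0.5),
  SO4  = rnorm(50, mean = 3, sd = 0.4)
)

# A horizontal bar chart, then the same chart flipped to vertical
p_horizontal <- ggplot(chem, aes(y = Site)) + geom_bar()
p_flipped <- p_horizontal + coord_flip()

# Scatterplot zoomed to fixed limits, with no padding around them
p_clipped <- ggplot(chem, aes(x = DOC, y = SO4)) +
  geom_point() +
  coord_cartesian(xlim = c(3, 5), ylim = c(2, 4), expand = FALSE)

p_flipped
p_clipped
```

coord_cartesian() zooms the view without dropping any data, whereas setting limits on a scale removes the points that fall outside them.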

Add reference lines

Sometimes we are interested in adding reference lines to provide context for our data. These can be done with geoms such as geom_hline(), geom_vline(), and geom_abline().

## Warning: Removed 858 rows containing missing values (geom_point).
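A sketch of horizontal and vertical reference lines. The pH values are the first six rows of the preview above; the dates, the threshold of 4.7, and the marked date are hypothetical and chosen only for illustration.

```r
library(ggplot2)

# pH from the first six preview rows; dates are simplified
chem <- data.frame(
  Date = seq(as.Date("2011-05-01"), by = "month", length.out = 6),
  pH   = c(4.59, 4.73, 4.62, 4.91, 4.68, 4.78)
)

p <- ggplot(chem, aes(x = Date, y = pH)) +
  geom_point() +
  # Horizontal line at a (hypothetical) pH threshold
  geom_hline(yintercept = 4.7, linetype = "dashed", color = "red") +
  # Vertical line at a (hypothetical) date of interest
  geom_vline(xintercept = as.Date("2011-08-01"), linetype = "dotted") +
  theme_bw()
p
```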

Saving plots

There are three common ways to export graphics.

  1. Export button

This is a quick way to save a plot to a file but not always repeatable.

Go to the Plots tab in your bottom right panel. Above the figure there are Zoom, Export, X, and broom options. Click Export -> Save as Image. Set the file type, filename, and dimensions however you like.

  2. Open file -> make plot -> close file

This is a system to let you save any kind of plot (base or ggplot) to a file directly by setting up the file and filling it with the plot. This setup covers PNG, JPEG, SVG, and TIFF images, as well as PDFs.
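The open, plot, close pattern can be sketched as follows. This example uses pdf() because it needs no extra graphics capabilities; png(), jpeg(), svg(), and tiff() follow the same pattern with their own size arguments. The file name and temporary folder are hypothetical choices.

```r
library(ggplot2)

# First six DOC values from the data preview; `chem` is a hypothetical name
chem <- data.frame(DOC = c(4.93, 3.72, 3.30, 3.58, 4.46, 4.15))
p <- ggplot(chem, aes(x = DOC)) +
  geom_histogram(bins = 5, color = "white")

out_file <- file.path(tempdir(), "doc_histogram.pdf")

pdf(file = out_file, width = 6, height = 4)  # 1. open the file (size in inches)
print(p)                                     # 2. make the plot
dev.off()                                    # 3. close the file

file.exists(out_file)
```

Inside scripts and functions, a ggplot needs an explicit print() to be drawn into the open file; base R plots such as hist() draw as soon as they are called.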

  3. ggsave

The ggsave function is specific to ggplot graphics, but it lets you have similar control over their size and resolution, and output file type.
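A minimal ggsave() sketch on simulated data, with a hypothetical file name in a temporary folder. ggsave() picks the file type from the extension; for raster formats such as PNG, the dpi argument sets the resolution.

```r
library(ggplot2)

# Hypothetical stand-in for the lesson data (values are simulated)
set.seed(42)
chem <- data.frame(
  DOC = rnorm(50, mean = 4, sd = 0.5),
  SO4 = rnorm(50, mean = 3, sd = 0.4)
)
p <- ggplot(chem, aes(x = DOC, y = SO4)) +
  geom_point() +
  theme_bw()

out_file <- file.path(tempdir(), "doc_vs_so4.pdf")

# Save a specific plot object at a fixed size
ggsave(filename = out_file, plot = p, width = 6, height = 4, units = "in")

file.exists(out_file)
```

Passing the plot explicitly with `plot = p` avoids relying on the default, which saves whichever plot was displayed last.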

I recommend method 2 or 3 because they are more reproducible.